Using a Hidden-Markov Model in Semi-Automatic Indexing of Historical Handwritten Records

Authors

  • Thomas Packer
  • Oliver Nina
  • Ilya Raykhel
Abstract

Indexing of historical records is a process that uses human effort to read text images and convert them into a machine-readable format that facilitates search. The Church of Jesus Christ of Latter-day Saints has been using volunteers to index millions of microfilm images of genealogy records collected throughout the world. This indexing process is time-consuming. We adapt a technique for holistic handwritten word recognition originally published by Victor Lavrenko et al. and use it to semi-automatically index US census record images, with the aim of improving the efficiency of the manual indexing process used in the LDS Church's "Internet Indexing" project. The data used for this project differs from that in Lavrenko's paper: we recognize words in three columns of a structured census image rather than unconstrained handwritten words. The approach resulted in 90% accuracy for the chosen columns using as little as one page of manually indexed training data.

Related Resources

Holistic Farsi handwritten word recognition using gradient features

In this paper we address the issue of recognizing Farsi handwritten words. Two types of gradient features are extracted from a sliding vertical stripe which sweeps across a word image. These are directional and intensity gradient features. The feature vector extracted from each stripe is then coded using the Self Organizing Map (SOM). In this method each word is modeled using the discrete Hidde...

A Multimodal Approach to Dictation of Handwritten Historical Documents

Handwritten Text Recognition is a problem that has gained attention in recent years due to the interest in the transcription of historical documents. Handwritten Text Recognition employs models that are similar to those employed in Automatic Speech Recognition (Hidden Markov Models and n-grams). Dictation of the contents of the document is an alternative to text recognition. In this work, we ...

Abnormality Detection in a Landing Operation Using Hidden Markov Model

The air transport industry is seeking to manage risks in air travel. Its main objective is to detect abnormal behaviors in various flight conditions. The current methods have some limitations and are based on studying the risks and measuring the effective parameters. These parameters do not remove the dependency of a flight process on the time and human decisions. In this paper, we used an HMM...

Comparison of Two Different Feature Sets for Offline Recognition of Handwritten Arabic Words

Normalization is a very important step in automatic cursive handwritten word recognition. Based on an offline recognition system for Arabic handwritten words which uses a semi-continuous 1-dimensional HMM recognizer, two different feature sets are presented. The dependence of the feature sets on normalization steps is discussed and their performances are compared using the IFN/ENIT database ...

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...

Publication date: 2009